`RecordBatch` normalization (flattening) #6758
… iterative function for `RecordBatch`. Not sure which one is better currently.
I had some questions about the implementation, since the one example from PyArrow doesn't clarify the edge cases here. Normalizing the `Schema` seems fairly straightforward to me; I'm just not sure about the following:
- Whether the iterative or the recursive approach is better (or whether there is something I missed).
- Whether `DataType::Struct` is the only `DataType` that requires flattening. To me, it looks like the only one that can contain nested `Field`s. (I'm also not sure whether I'm missing something with unwrapping, e.g. a `List<Struct>`.)
Any feedback/help would be appreciated!
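To make the two options in the first question concrete, here is a minimal sketch of both approaches on toy types. `ToyType`, `ToyField`, `flatten_recursive`, and `flatten_iterative` are hypothetical stand-ins written for illustration; they are not the arrow-rs `DataType`/`Field` types and not the code in this PR. The separator is passed as a parameter, and leaving `List<Struct>` untouched is an assumption of the sketch, not a settled answer to the second question.

```rust
// Toy stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Utf8,
    // A list whose element is described by a single child field.
    List(Box<ToyField>),
    // A struct with named child fields; the only type flattened here.
    Struct(Vec<ToyField>),
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: ToyType,
}

impl ToyField {
    fn new(name: &str, data_type: ToyType) -> Self {
        ToyField { name: name.to_string(), data_type }
    }
}

/// Recursive variant: struct children are renamed to `parent<sep>child`
/// and flattened again until no top-level struct remains.
fn flatten_recursive(fields: &[ToyField], separator: &str) -> Vec<ToyField> {
    let mut out = Vec::new();
    for field in fields {
        match &field.data_type {
            ToyType::Struct(children) => {
                let prefixed: Vec<ToyField> = children
                    .iter()
                    .map(|c| {
                        let name = format!("{}{}{}", field.name, separator, c.name);
                        ToyField::new(&name, c.data_type.clone())
                    })
                    .collect();
                out.extend(flatten_recursive(&prefixed, separator));
            }
            // Assumption: every non-struct type (including List) is kept as-is.
            _ => out.push(field.clone()),
        }
    }
    out
}

/// Iterative variant: an explicit stack replaces the call stack.
/// Fields are pushed in reverse so they pop in their original order.
fn flatten_iterative(fields: &[ToyField], separator: &str) -> Vec<ToyField> {
    let mut stack: Vec<ToyField> = fields.iter().rev().cloned().collect();
    let mut out = Vec::new();
    while let Some(field) = stack.pop() {
        match field.data_type {
            ToyType::Struct(children) => {
                for child in children.into_iter().rev() {
                    stack.push(ToyField {
                        name: format!("{}{}{}", field.name, separator, child.name),
                        data_type: child.data_type,
                    });
                }
            }
            other => out.push(ToyField { name: field.name, data_type: other }),
        }
    }
    out
}
```

In this sketch both variants return the same fields in the same order, so the choice comes down to readability versus avoiding deep recursion on heavily nested schemas. Extending either one to look inside `List<Struct>` would need an extra match arm, plus a decision about how the resulting columns should be named.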
Which issue does this PR close?
Closes #6369.
Rationale for this change
Adds normalization (flattening) for `RecordBatch`, with normalization via `Schema`. Based on pandas/pola-rs.
What changes are included in this PR?
Are there any user-facing changes?